reading-notes

About Machine Learning:


Exploratory Analysis

  1. You’ll gain valuable hints for Data Cleaning (which can make or break your models).
  2. You’ll think of ideas for Feature Engineering (which can take your models from good to great).
  3. You’ll get a “feel” for the dataset, which will help you communicate results and deliver greater impact.
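The three points above can be sketched as a few lines of pandas; the tiny inline DataFrame and its column names are hypothetical stand-ins for a real dataset loaded from disk.

```python
# A minimal exploratory-analysis sketch with pandas.
import pandas as pd

df = pd.DataFrame({
    "price": [250000, 310000, 180000, 420000],
    "sqft": [1500, 1800, 1100, 2400],
    "roof": ["composition", "metal", "composition", "slate"],
})

print(df.shape)            # rows x columns
print(df.dtypes)           # numeric vs. categorical features
print(df.describe())       # summary statistics for numeric features
print(df.isnull().sum())   # missing values per column

# value_counts() per categorical column is the tabular analogue of a
# bar plot -- handy for spotting typos and sparse classes.
for col in df.select_dtypes(include="object"):
    print(df[col].value_counts())
```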

Data Cleaning

  1. Fix Structural Errors: Structural errors are those that arise during measurement, data transfer, or other types of “poor housekeeping.” For instance, check for typos or inconsistent capitalization. This is mostly a concern for categorical features, and you can spot these issues in your bar plots.

  2. Filter Unwanted Outliers: Outliers can cause problems with certain types of models. For example, linear regression models are less robust to outliers than decision tree models. In general, if you have a legitimate reason to remove an outlier, doing so will help your model’s performance. However, outliers are innocent until proven guilty: you should never remove an outlier just because it’s a “big number.” That big number could be very informative for your model.

  3. Handle Missing Data: Missing data is a deceptively tricky issue in applied machine learning. First, to be clear, you cannot simply ignore missing values in your dataset. You must handle them in some way, for the very practical reason that most algorithms do not accept missing values.
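Step 1 above (structural errors) can be sketched with pandas; the column name and the specific typo are hypothetical.

```python
import pandas as pd

# Hypothetical categorical column with inconsistent capitalization and a typo.
df = pd.DataFrame({"roof": ["composition", "Composition", "compositon", "metal"]})

# Normalize case, then map known typos to the correct label.
df["roof"] = df["roof"].str.lower().replace({"compositon": "composition"})

print(df["roof"].value_counts())
```

After the fix, the bar plot (or `value_counts()`) shows one bar per real class instead of three near-duplicates.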
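For step 2 (outliers), one common heuristic, not prescribed by these notes, is the 1.5×IQR fence. The data below is made up; whether the extreme value is an error or a legitimate observation is exactly the judgment call the note warns about.

```python
import pandas as pd

df = pd.DataFrame({"sqft": [1200, 1500, 1800, 2000, 50000]})

# 1.5 * IQR fence: a common rule of thumb for flagging outliers.
# Investigate flagged rows before dropping them -- "innocent until
# proven guilty."
q1, q3 = df["sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["sqft"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
filtered = df[mask]
```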
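For step 3 (missing data), one simple handling strategy is to flag the missingness and then fill the hole, so the algorithm still sees every row while the "was missing" signal is preserved. Column names and fill choices here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "garage_sqft": [400.0, np.nan, 600.0, np.nan],
    "roof": ["metal", None, "composition", "composition"],
})

# Flag missingness first, then fill: 0 for an "absent" numeric feature
# and an explicit "Missing" label for a categorical one.
df["garage_missing"] = df["garage_sqft"].isnull().astype(int)
df["garage_sqft"] = df["garage_sqft"].fillna(0)
df["roof"] = df["roof"].fillna("Missing")
```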


Feature Engineering

  1. You can isolate and highlight key information, which helps your algorithms “focus” on what’s important.
  2. You can bring in your own domain expertise.
  3. Most importantly, once you understand the “vocabulary” of feature engineering, you can bring in other people’s domain expertise!

Combine Sparse Classes

Sparse classes (in categorical features) are those that have very few total observations. They can be problematic for certain machine learning algorithms, causing models to overfit.
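A minimal sketch of combining sparse classes with pandas; the class labels and the count threshold are arbitrary examples.

```python
import pandas as pd

roof = pd.Series(["composition", "composition", "metal", "slate",
                  "composition", "metal", "wood shake"])

# Group classes with fewer than 2 observations into a single "Other"
# bucket (the threshold is a judgment call per dataset).
counts = roof.value_counts()
sparse = counts[counts < 2].index
roof = roof.replace(dict.fromkeys(sparse, "Other"))
```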


Algorithm Selection

  1. Linear Regression is Flawed
  2. Regularization in Machine Learning: a technique used to prevent overfitting by artificially penalizing model coefficients. It discourages large coefficients by dampening them.
  3. Regularized Regression Algos: common variants include Lasso Regression, Ridge Regression, and Elastic-Net.
  4. Decision Tree Algos : Decision trees model data as a “tree” of hierarchical branches. They make branches until they reach “leaves” that represent predictions.
  5. Tree Ensembles: Ensembles are machine learning methods for combining predictions from multiple separate models. There are a few different ensembling methods, the most common being bagging (e.g., random forests) and boosting (e.g., gradient boosted trees).
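Points 2 and 3 can be illustrated with scikit-learn; the synthetic data and the `alpha` values are illustrative, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.5, size=100)  # only feature 0 matters

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can zero out coefficients
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients toward 0

# Lasso tends to drive irrelevant coefficients to exactly zero, which is
# the "dampening" the note describes taken to its sparsest extreme.
print(np.sum(lasso.coef_ == 0))
```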
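For points 4 and 5, a minimal tree-ensemble sketch with scikit-learn on synthetic data; model choices and defaults here are examples, not a tuning recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Bagging: many decision trees on bootstrapped samples, predictions averaged.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Boosting: trees built sequentially, each correcting its predecessors.
boosted = GradientBoostingRegressor(random_state=0).fit(X, y)

print(forest.score(X, y), boosted.score(X, y))  # R^2 on the training data
```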

Model Training

About Hyperparameters

  1. Model parameters: learned attributes that define individual models, such as regression coefficients or decision-tree split locations. They are learned directly from the training data.
  2. Hyperparameters: “higher-level” structural settings for algorithms, such as the strength of a regularization penalty. They are set before training begins.
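The distinction can be shown with scikit-learn's Ridge: `alpha` is a hyperparameter chosen before training, while `coef_` and `intercept_` are parameters learned from the data. The toy data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

model = Ridge(alpha=1.0)  # hyperparameter: set by us, before fitting
model.fit(X, y)

print(model.coef_, model.intercept_)  # parameters: learned from the data
```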